Outline

  1. IPython and IPython Notebooks
  2. Numpy
  3. Pandas

Python and IPython

  • Python is a programming language; python is also the name of the program that runs scripts written in that language.
  • If you're running scripts from the command line, you can use either ipython (e.g. ipython my_script.py) or python (e.g. python my_script.py).
  • If you're using the interpreter interactively to load and explore data, try out a new package, etc., always use ipython over python. ipython has a bunch of features like tab completion, inline help, and easy access to shell commands which are just plain great (more on these in a bit; see the short session sketch below).
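
For example, here is roughly what those features look like in an interactive ipython session (the file and package names are just placeholders):

import numpy as np
np.mea<TAB>         # tab completion suggests np.mean, np.median, ...
np.mean?            # a trailing ? shows the docstring and signature for np.mean
!ls data/           # a leading ! runs a shell command from inside ipython
files = !ls data/   # you can even capture the shell output in a Python list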

IPython Notebook

  • IPython notebook is an interactive front-end to ipython which lets you combine snippets of python code with explanations, images, videos, whatever.
  • It's also really convenient for conveying experimental results.
  • http://nbviewer.ipython.org

Notebook Concepts

  • Cells -- That grey box is called a cell. An IPython notebook is nothing but a series of cells.
  • Selecting -- You can tell if you have a cell selected because it will have a thin, black box around it.
  • Running a Cell -- Running a cell displays its output. You can run a cell by pressing shift + enter while it's selected (or click the play button toward the top of the screen).
  • Modes -- A selected cell is in one of two modes:
    • Command Mode -- Lets you delete a cell and change its type (more on this in a second).
    • Edit Mode -- Lets you change the contents of a cell.

Aside: Keyboard Shortcuts That I Use A Lot

  • (When describing keyboard shortcuts, + means 'press at the same time' and , means 'press after'.)
  • Enter -- Start editing the selected cell (switch from command mode to edit mode)
  • Esc -- Stop editing this cell
  • Option + Enter -- Run this cell and make a new cell after it (Note: this is OSX specific. Check help >> keyboard shortcuts to find your operating system's version)
  • Shift + Enter -- Run this cell and select the next one, without inserting a new cell
  • Up Arrow and Down Arrow -- Navigate between cells (must be in command mode)
  • Esc, m, Enter -- Convert the current cell to markdown and start editing it again
  • Esc, y, Enter -- Convert the current cell to a code cell and start editing it again
  • Esc, d, d -- Delete the current cell
  • Esc, a -- Create a new cell above the current one
  • Esc, b -- Create a new cell below the current one
  • Command + / -- Toggle comments in Python code (OSX)
  • Ctrl + / -- Toggle comments in Python code (Linux / Windows)

More keyboard shortcuts are listed [here](http://johnlaudun.org/20131228-ipython-notebook-keyboard-shortcuts/).

Numpy

Numpy is the main package you'll use for scientific computing in Python. It provides a multidimensional array datatype called ndarray, which supports things like vector and matrix computations.


In [1]:
# you don't have to rename numpy to np but it's customary to do so
import numpy as np

# you can create a 1-d array with a list of numbers
a = np.array([1, 4, 6])
print 'a:'
print a
print 'a.shape:', a.shape
print 

# you can create a 2-d array with a list of lists of numbers
b = np.array([[6, 7], [3, 1], [4, 0]])
print 'b:'
print b
print 'b.shape:', b.shape
print


a:
[1 4 6]
a.shape: (3,)

b:
[[6 7]
 [3 1]
 [4 0]]
b.shape: (3, 2)


In [2]:
# you can create an array of ones
print 'np.ones((3, 4)):'
print np.ones((3, 4))
print

# you can create an array of zeros
print 'np.zeros((2, 5)):'
print np.zeros((2, 5))
print

# you can create an array from a range of numbers and reshape it
print 'np.arange(6):'
print np.arange(6)
print 
print 'np.arange(6).reshape(2, 3):'
print np.arange(6).reshape(2, 3)
print

# you can take the transpose of a matrix with .transpose() or .T
print 'b and b.T:'
print b
print 
print b.T
print


np.ones((3, 4)):
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]

np.zeros((2, 5)):
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]

np.arange(6):
[0 1 2 3 4 5]

np.arange(6).reshape(2, 3):
[[0 1 2]
 [3 4 5]]

b and b.T:
[[6 7]
 [3 1]
 [4 0]]

[[6 3 4]
 [7 1 0]]


In [3]:
# you can iterate over rows
for i, this_row in enumerate(b):
    print 'row', i, ': ', this_row
print 
    
# you can access sections of an array with slices
print 'first two rows of the first column of b:'
print b[:2, 0]
print


row 0 :  [6 7]
row 1 :  [3 1]
row 2 :  [4 0]

first two rows of the first column of b:
[6 3]
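
You can also pick out rows with a boolean mask; for example, using the same b:

# b[:, 0] > 3 is a True/False array with one entry per row of b
print 'rows of b whose first column is greater than 3:'
print b[b[:, 0] > 3]
# (this prints the first and third rows, [[6 7], [4 0]])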


In [4]:
# you can concatenate arrays in various ways:
print 'np.hstack([b, b]):'
print np.hstack([b, b])
print

print 'np.vstack([b, b]):'
print np.vstack([b, b])
print


np.hstack([b, b]):
[[6 7 6 7]
 [3 1 3 1]
 [4 0 4 0]]

np.vstack([b, b]):
[[6 7]
 [3 1]
 [4 0]
 [6 7]
 [3 1]
 [4 0]]
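
Both hstack and vstack are special cases of np.concatenate, which takes an explicit axis argument:

# axis=0 stacks rows (like vstack), axis=1 stacks columns (like hstack)
print 'np.concatenate([b, b], axis=1):'
print np.concatenate([b, b], axis=1)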


In [5]:
# note that you get an error if you pass the arrays in separately instead of as a list
print np.hstack(b, b)
print


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-93b42dec95f7> in <module>()
      1 # note that you get an error if you pass the arrays in separately instead of as a list
----> 2 print np.hstack(b, b)
      3 print

TypeError: hstack() takes exactly 1 argument (2 given)

In [6]:
# you can perform matrix multiplication with np.dot()
c = np.dot(a, b)
print 'c = np.dot(a, b):'
print c
print

# if a is already a numpy array, then you can also use this chained 
# matrix multiplication notation.  use whichever looks cleaner in 
# context
print 'a.dot(b):'
print a.dot(b)
print


# you can perform element-wise multiplication with * 
d = b * b
print 'd = b * b:'
print d
print

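# a bare expression at the end of a cell is displayed as the cell's Out[] value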
a.dot(b)


c = np.dot(a, b):
[42 11]

a.dot(b):
[42 11]

d = b * b:
[[36 49]
 [ 9  1]
 [16  0]]

Out[6]:
array([42, 11])

Arrays and Matrices

In addition to arrays, which can have any number of dimensions, Numpy also has a matrix data type which always has exactly 2. DO NOT USE matrix.

The original intention behind this data type was to make Numpy feel a bit more like Matlab, mainly by making the * operator perform matrix multiplication so you don't have to use np.dot. But matrix isn't as well supported as array: it can be slower, and passing one into other people's code will sometimes cause errors because everyone expects you to use array.
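
To see the difference concretely, here's a small comparison sketch (np.matrix appears here only to illustrate why array is the safer habit):

A = np.array([[1, 2], [3, 4]])
M = np.matrix([[1, 2], [3, 4]])

print 'array * array is element-wise:'
print A * A        # [[ 1  4]
                   #  [ 9 16]]
print
print 'matrix * matrix is a matrix product:'
print M * M        # [[ 7 10]
                   #  [15 22]]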


In [7]:
# you can convert a 1-d array to a 2-d array with np.newaxis
print 'a:'
print a
print 'a.shape:', a.shape
print 
print 'a[np.newaxis] is a 2-d row vector:'
print a[np.newaxis]
print 'a[np.newaxis].shape:', a[np.newaxis].shape
print

print 'a[np.newaxis].T is a 2-d column vector:'
print a[np.newaxis].T
print 'a[np.newaxis].T.shape:', a[np.newaxis].T.shape
print


a:
[1 4 6]
a.shape: (3,)

a[np.newaxis] is a 2-d row vector:
[[1 4 6]]
a[np.newaxis].shape: (1, 3)

a[np.newaxis].T is a 2-d column vector:
[[1]
 [4]
 [6]]
a[np.newaxis].T.shape: (3, 1)
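
An equivalent way to get the same column-vector shape is reshape, where -1 means 'infer this dimension from the others':

print 'a.reshape(-1, 1):'
print a.reshape(-1, 1)
print 'a.reshape(-1, 1).shape:', a.reshape(-1, 1).shape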


In [8]:
# numpy provides a ton of other functions for working with matrices
m = np.array([[1, 2],[3, 4]])
m_inverse = np.linalg.inv(m)
print 'inverse of [[1, 2],[3, 4]]:'
print m_inverse
print

print 'm.dot(m_inverse):'
print m.dot(m_inverse)


inverse of [[1, 2],[3, 4]]:
[[-2.   1. ]
 [ 1.5 -0.5]]

m.dot(m_inverse):
[[  1.00000000e+00   0.00000000e+00]
 [  8.88178420e-16   1.00000000e+00]]
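
If you only need the inverse in order to solve a linear system, np.linalg.solve does that directly and is generally preferable numerically. A small sketch using the same m:

v = np.array([5, 6])
x = np.linalg.solve(m, v)   # solves m.dot(x) == v without forming m_inverse
print 'solution x of m.dot(x) = v:', x
print 'check, m.dot(x):', m.dot(x)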

In [9]:
# and for doing all kinds of sciency stuff, like generating random numbers:
np.random.seed(5678)
n = np.random.randn(3, 4)
print 'a matrix with random entries drawn from a Normal(0, 1) distribution:'
print n


a matrix with random entries drawn from a Normal(0, 1) distribution:
[[-0.70978938 -0.01719118  0.31941137 -2.26533107]
 [-1.37745366  1.94998073 -0.56381007 -0.84373759]
 [ 0.22453858 -0.39137772  0.60550347 -0.68615034]]

Self-Driven Numpy Exercise

  1. In the cell below, add a column of ones to the matrix X_no_constant. This is a common task in linear regression and general linear modeling and something that you'll have to be able to do later today.
  2. Multiply your new matrix by the betas vector below to make a vector called y
  3. You'll know you've got it when the cell prints '****** Tests passed! ******' at the bottom.

Specifically, given a matrix:

\begin{equation*} \qquad \mathbf{X_{NoConstant}} = \left( \begin{array}{cccc} x_{1,1} & x_{1,2} & \dots & x_{1,D} \\ x_{2,1} & x_{2,2} & \dots & x_{2,D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i,1} & x_{i,2} & \dots & x_{i,D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \dots & x_{N,D} \\ \end{array} \right) \qquad \end{equation*}

We want to convert it to: \begin{equation*} \qquad \mathbf{X} = \left( \begin{array}{ccccc} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,D} \\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,D} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{i,1} & x_{i,2} & \dots & x_{i,D} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N,1} & x_{N,2} & \dots & x_{N,D} \\ \end{array} \right) \qquad \end{equation*}

So that if we have a vector of regression coefficients like this:

\begin{equation*} \qquad \beta = \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_j \\ \vdots \\ \beta_D \end{array} \right) \end{equation*}

We can do this:

\begin{equation*} \mathbf{y} \equiv \mathbf{X} \mathbf{\beta} \end{equation*}

In [14]:
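# scratch work for the exercise below: a column of ones with shape (n_data, 1)
# (n_data is only defined in the next cell, so that cell has to be run first)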
a = np.ones(n_data)[np.newaxis].T
a


Out[14]:
array([[ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.]])

In [16]:
np.random.seed(3333)
n_data = 10 # number of data points. i.e. N
n_dim = 5   # number of dimensions of each datapoint.  i.e. D

betas = np.random.randn(n_dim + 1)

X_no_constant = np.random.randn(n_data, n_dim)
print 'X_no_constant:'
print X_no_constant
print 

# INSERT YOUR CODE HERE!
X = np.hstack([np.ones(n_data)[np.newaxis].T, X_no_constant])
y = np.dot(X, betas)

# Tests:
y_expected = np.array([-0.41518357, -9.34696153, 5.08980544, 
                       -0.26983873, -1.47667864, 1.96580794, 
                       6.87009791, -2.07784135, -0.7726816, 
                       -2.74954984])
np.testing.assert_allclose(y, y_expected)
print '****** Tests passed! ******'


X_no_constant:
[[-0.92232935  0.27352359 -0.86339625  1.43766044 -1.71379871]
 [ 0.179322   -0.89138595  2.13005603  0.51898975 -0.41875106]
 [ 0.34010119 -1.07736609 -1.02314142 -1.02518535  0.40972072]
 [ 1.18883814  1.01044759  0.3108216  -1.17868611 -0.49526331]
 [-1.50248369 -0.196458    0.34752922 -0.79200465 -0.31534705]
 [ 1.73245191 -1.42793626 -0.94376587  0.86823495 -0.95946769]
 [-1.07074604 -0.06555247 -2.17689578  1.58538804  1.81492637]
 [-0.73706088  0.77546031  0.42653908 -0.51853723 -0.53045538]
 [ 1.09620536 -0.69557321  0.03080082  0.25219596 -0.35304303]
 [-0.93971165  0.04448078  0.04273069  0.4961477  -1.7673568 ]]

****** Tests passed! ******

Pandas

Pandas is a Python package which adds some useful data analysis features to numpy arrays. Most importantly, it contains a DataFrame data type, like the R data frame: a set of named columns organized into something like a 2-d array. Pandas is great.


In [19]:
# like with numpy, you don't have to rename pandas to pd, but it's customary to do so
import pandas as pd

b = np.array([[6, 7], [3, 1], [4, 0]])
df = pd.DataFrame(data=b,  columns=['Weight', 'Height'])
print 'b:'
print b
print 
print 'DataFrame version of b:'
print df
print


b:
[[6 7]
 [3 1]
 [4 0]]

DataFrame version of b:
   Weight  Height
0       6       7
1       3       1
2       4       0


In [20]:
# Pandas can save and load CSV files.  
# Python can do this too, but with Pandas, you get a DataFrame 
# at the end which understands things like column headings
baseball = pd.read_csv('data/baseball.dat.txt')

# A DataFrame's .head() method shows its first 5 rows
baseball.head()


Out[20]:
Salary AVG OBP Runs Hits Doubles Triples HR RBI Walks SO SB Errs free agency eligibility free agent in 1991/2 arbitration eligibility arbitration in 1991/2 Name
0 3300 0.272 0.302 69 153 21 4 31 104 22 80 4 3 1 0 0 0 Andre Dawson
1 2600 0.269 0.335 58 111 17 2 18 66 39 69 0 3 1 1 0 0 Steve Buchele
2 2500 0.249 0.337 54 115 15 1 17 73 63 116 6 5 1 0 0 0 Kal Daniels
3 2475 0.260 0.292 59 128 22 7 12 50 23 64 21 21 0 0 1 0 Shawon Dunston
4 2313 0.273 0.346 87 169 28 5 8 58 70 53 3 8 0 0 1 0 Mark Grace
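
The comment above mentions saving as well as loading; going the other direction looks like this (the output filename here is just an example):

# write the DataFrame back out to disk; index=False leaves out the row numbers
baseball.to_csv('data/baseball_copy.csv', index=False)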

In [22]:
# you can see all the column names
print 'baseball.keys():'
print baseball.keys()
print

# print 'baseball.Salary:'
# print baseball.Salary
# print 
# print "baseball['Salary']:"
# print baseball['Salary']


baseball.keys():
Index([u'Salary', u'AVG', u'OBP', u'Runs', u'Hits', u'Doubles', u'Triples',
       u'HR', u'RBI', u'Walks', u'SO', u'SB', u'Errs',
       u'free agency eligibility', u'free agent in 1991/2',
       u'arbitration eligibility', u'arbitration in 1991/2', u'Name'],
      dtype='object')


In [23]:
baseball.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 337 entries, 0 to 336
Data columns (total 18 columns):
Salary                     337 non-null int64
AVG                        337 non-null float64
OBP                        337 non-null float64
Runs                       337 non-null int64
Hits                       337 non-null int64
Doubles                    337 non-null int64
Triples                    337 non-null int64
HR                         337 non-null int64
RBI                        337 non-null int64
Walks                      337 non-null int64
SO                         337 non-null int64
SB                         337 non-null int64
Errs                       337 non-null int64
free agency eligibility    337 non-null int64
free agent in 1991/2       337 non-null int64
arbitration eligibility    337 non-null int64
arbitration in 1991/2      337 non-null int64
Name                       337 non-null object
dtypes: float64(2), int64(15), object(1)
memory usage: 47.5+ KB

In [24]:
baseball.describe()


Out[24]:
Salary AVG OBP Runs Hits Doubles Triples HR RBI Walks SO SB Errs free agency eligibility free agent in 1991/2 arbitration eligibility arbitration in 1991/2
count 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000 337.000000
mean 1248.528190 0.257825 0.323973 46.697329 92.833828 16.673591 2.338279 9.097923 44.020772 35.017804 56.706231 8.246291 6.771513 0.397626 0.115727 0.192878 0.029674
std 1240.013309 0.039546 0.047132 29.020166 51.896322 10.452001 2.543336 9.289934 29.559406 24.842474 33.828784 11.664782 5.927490 0.490135 0.320373 0.395145 0.169938
min 109.000000 0.063000 0.063000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 230.000000 0.238000 0.297000 22.000000 51.000000 9.000000 0.000000 2.000000 21.000000 15.000000 31.000000 1.000000 3.000000 0.000000 0.000000 0.000000 0.000000
50% 740.000000 0.260000 0.323000 41.000000 91.000000 15.000000 2.000000 6.000000 39.000000 30.000000 49.000000 4.000000 5.000000 0.000000 0.000000 0.000000 0.000000
75% 2150.000000 0.281000 0.354000 69.000000 136.000000 23.000000 3.000000 15.000000 66.000000 49.000000 78.000000 11.000000 9.000000 1.000000 0.000000 0.000000 0.000000
max 6100.000000 0.457000 0.486000 133.000000 216.000000 49.000000 15.000000 44.000000 133.000000 138.000000 175.000000 76.000000 31.000000 1.000000 1.000000 1.000000 1.000000

In [26]:
#  baseball

In [34]:
# You can perform queries on your data frame.  
# This statement gives you a True/False vector telling you 
# whether the player in each row has a salary over $1 Million
millionaire_indices = baseball['Salary'] > 1000
# print millionaire_indices

In [28]:
# you can use the query indices to look at a subset of your original dataframe
print 'baseball.shape:', baseball.shape
print "baseball[millionaire_indices].shape:", baseball[millionaire_indices].shape


baseball.shape: (337, 18)
baseball[millionaire_indices].shape: (139, 18)

In [33]:
# you can look at a subset of rows and columns at the same time
print "baseball[millionaire_indices][['Salary', 'AVG', 'Runs', 'Name']]:"
baseball[millionaire_indices][['Salary', 'AVG', 'Runs', 'Name']].head()


baseball[millionaire_indices][['Salary', 'AVG', 'Runs', 'Name']]:
Out[33]:
   Salary    AVG  Runs            Name
0    3300  0.272    69    Andre Dawson
1    2600  0.269    58   Steve Buchele
2    2500  0.249    54     Kal Daniels
3    2475  0.260    59  Shawon Dunston
4    2313  0.273    87      Mark Grace
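
An equivalent and arguably cleaner way to do the row-and-column selection above in one step is .loc, which takes a row selector and a column selector together:

baseball.loc[millionaire_indices, ['Salary', 'AVG', 'Runs', 'Name']].head()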

Pandas Joins - If you have time

The real magic of a Pandas DataFrame comes from the merge method, which can match up the rows and columns from two DataFrames and combine their data. Let's load another file which has shoe sizes for just a few players.


In [30]:
# load shoe size data
shoe_size_df = pd.read_csv('data/baseball2.dat.txt')
shoe_size_df


Out[30]:
   Shoe Size          Name
0         11  Andre Dawson
1         13    Mark Grace
2         12    Sammy Sosa

In [31]:
merged = pd.merge(baseball, shoe_size_df, on=['Name'])
merged


Out[31]:
Salary AVG OBP Runs Hits Doubles Triples HR RBI Walks SO SB Errs free agency eligibility free agent in 1991/2 arbitration eligibility arbitration in 1991/2 Name Shoe Size
0 3300 0.272 0.302 69 153 21 4 31 104 22 80 4 3 1 0 0 0 Andre Dawson 11
1 2313 0.273 0.346 87 169 28 5 8 58 70 53 3 8 0 0 1 0 Mark Grace 13
2 200 0.203 0.240 39 64 10 1 10 33 14 96 13 6 0 0 0 0 Sammy Sosa 12

In [32]:
merged_outer = pd.merge(baseball, shoe_size_df, on=['Name'], how='outer')
merged_outer.head()


Out[32]:
Salary AVG OBP Runs Hits Doubles Triples HR RBI Walks SO SB Errs free agency eligibility free agent in 1991/2 arbitration eligibility arbitration in 1991/2 Name Shoe Size
0 3300 0.272 0.302 69 153 21 4 31 104 22 80 4 3 1 0 0 0 Andre Dawson 11.0
1 2600 0.269 0.335 58 111 17 2 18 66 39 69 0 3 1 1 0 0 Steve Buchele NaN
2 2500 0.249 0.337 54 115 15 1 17 73 63 116 6 5 1 0 0 0 Kal Daniels NaN
3 2475 0.260 0.292 59 128 22 7 12 50 23 64 21 21 0 0 1 0 Shawon Dunston NaN
4 2313 0.273 0.346 87 169 28 5 8 58 70 53 3 8 0 0 1 0 Mark Grace 13.0
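
Besides the default inner join and how='outer', merge also supports how='left' and how='right'. For example, a left join keeps every row of baseball and fills in NaN where there is no matching shoe size:

merged_left = pd.merge(baseball, shoe_size_df, on=['Name'], how='left')
print merged_left.shape   # same number of rows as baseball, with one extra column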

Self-Driven Pandas Exercise

  1. Partner up with someone next to you. Then, on one of your computers:

    1. Prepend a column of ones to the dataframe X_df below. Name the new column 'const'.
    2. Again, matrix multiply X_df by the betas vector and assign the result to a new variable: y_new
    3. You'll know you've got it when the cell prints '****** Tests passed! ******' at the bottom.

    Hint: This stackoverflow post may be useful: http://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns


In [36]:
np.random.seed(3333)
n_data = 10 # number of data points. i.e. N
n_dim = 5   # number of dimensions of each datapoint.  i.e. D

betas = np.random.randn(n_dim + 1)

X_df = pd.DataFrame(data=np.random.randn(n_data, n_dim))

# INSERT YOUR CODE HERE!
X_df['const'] = np.ones(n_data)
y_new = np.dot(X_df, betas)

# Tests:
assert 'const' in X_df.keys(), 'The new column must be called "const"'
assert np.all(X_df.shape == (n_data, n_dim+1))
assert len(y_new) == n_data
print '****** Tests passed! ******'


****** Tests passed! ******

In [37]:
X_df


Out[37]:
          0         1         2         3         4  const
0 -0.922329  0.273524 -0.863396  1.437660 -1.713799    1.0
1  0.179322 -0.891386  2.130056  0.518990 -0.418751    1.0
2  0.340101 -1.077366 -1.023141 -1.025185  0.409721    1.0
3  1.188838  1.010448  0.310822 -1.178686 -0.495263    1.0
4 -1.502484 -0.196458  0.347529 -0.792005 -0.315347    1.0
5  1.732452 -1.427936 -0.943766  0.868235 -0.959468    1.0
6 -1.070746 -0.065552 -2.176896  1.585388  1.814926    1.0
7 -0.737061  0.775460  0.426539 -0.518537 -0.530455    1.0
8  1.096205 -0.695573  0.030801  0.252196 -0.353043    1.0
9 -0.939712  0.044481  0.042731  0.496148 -1.767357    1.0
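
The exercise says 'prepend', but the solution above appends 'const' as the last column; the tests only check the shape, not the order. The order does matter if you want betas[0] to line up with the constant column, as in the numpy exercise. One way to move 'const' to the front, in the spirit of the stackoverflow hint:

# build the desired column order explicitly and select the columns in that order
X_df = X_df[['const'] + [c for c in X_df.columns if c != 'const']]
X_df.head()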

In [ ]: